Automated Generation of High-Performance Large-Scale Matrix Multiplication Accelerator on FPGA

Authors

  • Jie Wang
  • Jason Cong
Abstract

Matrix multiplication (MM) is a key linear algebra routine used widely across many application areas. In this work we present a high-performance single-precision dense MM accelerator for FPGAs, together with an automatic generator that produces an accelerator with high throughput and high resource efficiency from hardware and MM workload specifications. The accelerator uses the linear systolic array as its basic building block and has an optimized architecture that integrates several such blocks. The size and the number of blocks are parameterized, so the user can search for the optimal design parameters through automatic design space exploration. The accelerator is tested on the Xilinx VC709 evaluation board and shows a peak performance of 198.1 GFLOPs.
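To make the building block concrete, the following is a minimal software sketch of how a linear (one-dimensional) systolic array can compute C = A x B. It is an illustration under stated assumptions, not the authors' architecture: the abstract does not describe the dataflow, so the mapping chosen here (each processing element holds one column of B, and the elements of A stream through the chain one hop per cycle) and the name `linear_systolic_matmul` are hypothetical.

```python
def linear_systolic_matmul(A, B):
    """Cycle-level model of a 1D systolic array computing C = A x B.

    Assumed mapping (not taken from the paper): PE j stores column j of B
    locally; elements of A enter PE 0 in row-major order and move one PE
    to the right every cycle.
    """
    M, K, N = len(A), len(B), len(B[0])

    # Input stream: each token carries its row index, its k index and its value.
    stream = [(i, k, A[i][k]) for i in range(M) for k in range(K)]

    regs = [None] * N          # pipeline register in front of each PE
    acc = [0.0] * N            # one accumulator per PE
    C = [[0.0] * N for _ in range(M)]

    # The last token needs N - 1 extra cycles to reach the last PE.
    total_cycles = len(stream) + N - 1

    for t in range(total_cycles):
        # Shift the chain by one position, then feed the next token.
        for j in range(N - 1, 0, -1):
            regs[j] = regs[j - 1]
        regs[0] = stream[t] if t < len(stream) else None

        # Every PE that holds a token performs one multiply-accumulate.
        for j in range(N):
            if regs[j] is None:
                continue
            i, k, a = regs[j]
            acc[j] += a * B[k][j]
            if k == K - 1:     # row i of A is complete for this PE
                C[i][j] = acc[j]
                acc[j] = 0.0

    return C, total_cycles
```

In this model the number of processing elements equals the number of columns of B, which is one way a "block size" parameter could enter the design. A real generator would also tile large matrices and account for on-chip memory and bandwidth, none of which is modeled here.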

Similar resources

A High Performance FPGA-Based Accelerator for BLAS Library Implementation

This paper describes the implementation and performance analysis of a hardware accelerator for the BLAS library matrix multiplication operation. The accelerator is based on a dual-FPGA board and on an implementation of the BLAS software library that makes use of the FPGA-based hardware. In order to evaluate the performance of such a system, we implemented the matrix multiplication operation (BLAS “dg...


FPGA accelerator for floating-point matrix multiplication

This study treats the architecture and implementation of an FPGA accelerator for double-precision floating-point matrix multiplication. The architecture is oriented towards minimising resource utilisation and maximising clock frequency. It employs the block matrix multiplication algorithm, which returns the result blocks to the host processor as soon as they are computed. This avoids output buffering...


Random access schemes for efficient FPGA SpMV acceleration

Utilizing hardware resources efficiently is vital to building the future generation of high-performance computing systems. The sparse matrix – dense vector multiplication (SpMV) kernel, which is notorious for its poor efficiency on conventional processors, is a key component in many scientific computing applications and increasing SpMV efficiency can contribute significantly to improving overal...


FPGA based dataflow accelerator for large matrix multiplication

Real-world numerical applications often require a huge number of calculations to be done in a short time. The best way to speed up these applications is to exploit a huge amount of data parallelism by parallelizing independent calculations. Multi-core processors do not have enough resources to achieve any significant utilization of available data parallelism. Instead of adding new CPUs, addition ...


Energy-Efficient Design of Kernel Applications for FPGAs Through Domain-Specific Modeling

Because of their high performance and flexibility, FPGAs are an attractive option for use in embedded systems, where both high performance and low energy consumption are important. Therefore, it is important to create FPGA designs that are not only high performance but also low energy. The flexibility of FPGAs facilitates their high performance, but also makes it difficult to design for them. T...


Publication date: 2016